Model-Free Gene Selection Method by Considering Unbalanced Samples
Abstract
In gene expression data analysis, discriminator genes are highly informative genes for further research. Recently, much research has focused on the challenging task of identifying these informative genes from microarray data. However, the sample classes in microarray data are often unbalanced in size, and existing gene selection methods, especially nonparametric ones, have not explicitly and correctly accounted for this imbalance. Taking into account both the imbalance of samples and the stability of the approach, this paper proposes a novel, model-free gene selection method. Gene scoring functions are constructed from the within-class difference and the between-class variation, together with the homogeneity of each, and are used to select discriminator genes. The method applies to both two-class and multi-class problems. Experimental results on two publicly available microarray datasets, the leukemia data and the small round blue cell tumor data, show that the proposed method selects discriminator genes efficiently and robustly.
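The abstract does not state the scoring functions themselves. Purely as an illustrative sketch, and not the authors' method, the Python code below shows one way a model-free gene score can combine between-class variation with within-class difference while giving every class equal weight, so that a large class cannot dominate either term. The function names `balanced_median_score` and `rank_genes`, the use of medians and mean absolute deviations, and the ratio form are all assumptions. The homogeneity terms mentioned in the abstract are not modeled.

```python
from statistics import median


def balanced_median_score(values, labels):
    """Hypothetical class-balanced, model-free score for ONE gene.

    NOT the paper's scoring function; an illustrative stand-in.
    values: expression values of the gene across samples
    labels: class label of each sample (two or more classes)
    """
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    if len(groups) < 2:
        raise ValueError("need at least two classes")
    k = len(groups)
    class_medians = {c: median(g) for c, g in groups.items()}

    # Between-class variation: mean absolute spread of the class medians
    # around their own median (each class counts once, whatever its size).
    center = median(class_medians.values())
    between = sum(abs(m - center) for m in class_medians.values()) / k

    # Within-class difference: mean absolute deviation from the class median,
    # averaged inside each class first, then across classes with equal weights.
    within = sum(
        sum(abs(v - class_medians[c]) for v in g) / len(g)
        for c, g in groups.items()
    ) / k

    eps = 1e-12  # guards against division by zero for constant genes
    return between / (within + eps)


def rank_genes(matrix, labels, top_k):
    """Return indices of the top_k genes (rows of matrix) by the hypothetical score."""
    scores = [balanced_median_score(row, labels) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


if __name__ == "__main__":
    # Toy, made-up data: 5 samples of class "A" and 2 of class "B".
    labels = ["A"] * 5 + ["B"] * 2
    data = [
        [1.0, 1.1, 0.9, 1.0, 1.2, 5.0, 5.2],  # separates A from B
        [1.0, 5.0, 3.0, 2.0, 4.0, 3.0, 2.0],  # does not
    ]
    print(rank_genes(data, labels, top_k=1))  # prints [0]
```

Because each class's spread is averaged within the class before classes are combined, duplicating every sample of the larger class leaves the score unchanged; a pooled (sample-weighted) within-class sum would instead let the majority class dominate.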